ABSTRACT



Context. A number of large spectroscopic surveys of stars in the Milky Way are under way or are being planned. In this context it is 

important to discuss the extent to which elemental abundances can be used as discriminators between different (known and unknown) 

stellar populations in the Milky Way. 

Aims. We aim to establish the requirements in terms of precision in elemental abundances, as derived from spectroscopic surveys of 

the Milky Way's stellar populations, in order to detect interesting substructures in elemental abundance space. 

Methods. We used Monte Carlo simulations to examine under which conditions substructures in elemental abundance space can 

realistically be detected. 

Results. We present a simple relation between the minimum number of stars needed to detect a given substructure and the precision of 

the measurements. The results are in agreement with recent small- and large-scale studies, with high and low precision, respectively. 

Conclusions. Large-number statistics cannot fully compensate for low precision in the abundance measurements. Each survey should 

carefully evaluate what the main science drivers are for the survey and ensure that the chosen observational strategy will result in the 

precision necessary to answer the questions posed. 

Key words. Methods: data analysis - Methods: statistical - Stars: abundances - Galaxy: abundances 



1. Introduction 

In Galactic archaeology stars are used as time capsules. The 
outer layers of their atmospheres, accessible to spectroscopic 
studies, are presumed to keep a fair representation of the mixture 
of elements present in the gas cloud out of which they formed. 
By determining the elemental abundances in the stellar photo- 
spheres, and combining with kinematics and age information, it 
is possible to piece together the history of the Milky Way (e.g., 
Freeman & Bland-Hawthorn 2002).

Edvardsson et al. (1993) were among the first to demonstrate
that it is possible to achieve high precision in studies of ele- 
mental abundances for large samples of long-lived dwarf stars. 
In their study of 189 nearby dwarf stars they achieved a precision¹
better than 0.05 dex for many elements, including iron and
nickel. More recent studies have achieved similar or even better
results. Nissen & Schuster (2010), in a study of dwarf stars in
the halo, obtain a precision of 0.02-0.04 dex for α-elements relative
to iron, and 0.01 dex for nickel relative to iron. In studies of
solar twins (i.e., stars whose stellar parameters, including metallicity,
closely match those of the Sun) Meléndez et al. (2012) are
able to achieve a precision better than 0.01 dex. At the same time 
several studies have found that in the solar neighbourhood there 
exist substructures in the elemental abundance trends with differences
as large as 0.1 to 0.2 dex (e.g., Fuhrmann 1998; Bensby et al. 2004;
Nissen & Schuster 2010).



Driven both by technological advances and the need for
ground-based observations to complement and follow up the expected
observations from the Gaia satellite, Galactic astronomy
is entering a new regime where elemental abundances are derived
for very large samples of stars. Dedicated survey telescopes
and large surveys using existing telescopes have already moved
Galactic astronomy into the era of large spectroscopic surveys
(Zwitter et al. 2008; Yanny et al. 2009; Majewski et al. 2010;
Gilmore et al. 2012). With the new surveys, several hundred
thousands of stars will be observed for each stellar component
of the Galaxy. For all of these stars we will have elemental abundances
as well as kinematics and, when feasible, ages. One goal
for these studies is to quantify the extent to which the differences
in elemental abundances seen in the solar neighbourhood extend
to other parts of the stellar disk(s) and halo, and to identify other
(as yet unknown) components that may exist here and elsewhere.

¹ The term 'precision' refers to the ability of a method to give the
same result from repeated measurements (observations). See Sect. 4 and
footnote 5 for the distinction between precision and accuracy.

Large-scale surveys naturally tend to have lower signal-to- 
noise ratios for the individual stars than can be achieved in the 
classical studies of small stellar samples in the solar neighbour- 
hood. On the other hand, the very large number of stars reached 
with the new surveys will at least partly compensate for a lower 
precision per object. A relevant question is thus: How many stars 
do we need to detect a certain abundance signature of Y dex, 
when we have a precision of Z dex in the individual abundance 
determinations? This is what we explore in this Research Note. 

This Research Note is structured as follows: Section 2 sets
out the problem, which is then investigated in Sect. 3. In Sect. 4
we discuss what accuracies and precisions have been shown to
be possible and what is feasible to expect from large-scale surveys.
Section 5 contains some concluding remarks.



L. Lindegren and S. Feltzing: The case for high precision in elemental abundances (RN) 



[Figure 1: three panels, a) to c), with [Fe/H] on the horizontal axis; panel c) marks a separation of 0.25 dex.]

Fig. 1. Illustration of the problem, showing Fe and Mg abundances
for stars in the solar neighbourhood. a) Based on data
by Fuhrmann (see text for references). At each value of [Fe/H]
the stars fall into two groups with distinctly different [Mg/Fe].
b) Based on data for stars with halo velocities from Nissen &
Schuster (2010). The two lines, drawn by hand, illustrate the
separation into high- and low-α stars identified by Nissen & Schuster
(2010). c) Illustration of the generic problem treated here.



2. Defining the problem 

Elemental abundances derived from stellar spectra with high res- 
olution and high signal-to-noise ratios have shown that the stars 
in the Milky Way and in the nearby dwarf spheroidal galaxies
have a range of elemental abundances (see, e.g., Tolstoy
et al. 2009). Not only do the stars span many orders of magnitude
in iron abundance ([Fe/H]²), they also show subtler differences
in relative abundance. One of the most well-known examples
is given by the solar neighbourhood, where for example
Fuhrmann (1998, 2000, 2002, 2004, 2008, 2011) shows from a
basically volume-limited sample that there are two abundance
trends present. One trend has high [Mg/Fe] and the other has low,
almost solar, [Mg/Fe]. Figure 1a reproduces his results. The basic
result, i.e., that there is a split in the abundance trends, was
foreshadowed by several studies (e.g., Edvardsson et al. 1993) and
has been reproduced by a number of studies since (e.g., Reddy
et al. 2003; Bensby et al. 2004, 2005; Reddy et al. 2006; Neves
et al. 2009; Adibekyan et al. 2012). Another well-known example
in the solar neighbourhood is the split in α-elements as well
as in Na and Ni for stars with typical halo kinematics (Nissen &
Schuster 2010, and Fig. 1b). The differences in elemental abundances
between these different populations can be as large as
0.2 dex, but often they are smaller.

Figure 1c illustrates the highly simplified case considered in
the present study, namely that the observed stars belong to two
populations that differ in some abundance ratio [X/Fe] by a certain
amount. In the figure the difference is taken to be 0.25 dex,
which as we have seen may be representative of actual abundance
differences. We will investigate whether it is possible to
distinguish the two populations depending on the number of
stars considered and the precision of the individual [X/Fe] measurements.
This will allow us to derive a lower limit for the precision
needed to probe abundance trends such as those shown
in Fig. 1. We emphasize that the objective is to identify such
substructures in elemental abundance space without a priori
categorization of the stars, e.g., in terms of kinematic populations.

² We use the standard notation for elemental abundances where
$[\mathrm{Fe/H}] = \log(N_\mathrm{Fe}/N_\mathrm{H})_\star - \log(N_\mathrm{Fe}/N_\mathrm{H})_\odot$.

3. Investigation 

The problem is formulated as a classical hypothesis test. 
Although hypothesis testing is a well-known technique, and the 
present application follows standard methodology, we describe 
our assumptions and calculations in some detail in order to pro- 
vide a good theoretical framework for the subsequent discussion. 

Consider a sample of $N$ stars for which measurements $x_i$,
$i = 1, \dots, N$ of some quantity $X$ (e.g., [Mg/Fe]) have been made
with uniform precision. The null hypothesis $H_0$ is that there is
just a single population with fixed but unknown mean abundance
$\mu$ (but possibly with some intrinsic scatter, assumed to be
Gaussian). Assuming that the measurement errors are unbiased
and Gaussian, the values $x_i$ are thus expected to scatter around $\mu$
with some dispersion $\sigma$ which is essentially unknown because it
includes the internal scatter as well as the measurement errors.
The alternative hypothesis $H_A$ is that the stars are drawn from
two distinct and equally large populations, with mean values $\mu_1$
and $\mu_2$, respectively, but otherwise similar properties. In particular,
the intrinsic scatter in each population is the same as in $H_0$,
and the measurement errors are also the same. Without loss of
generality we may take $\mu = 0$ in $H_0$, and $\mu_{1,2} = \pm r\sigma/2$ in $H_A$, so
that the populations are separated by $r > 0$ standard deviations
in $H_A$, and by $r = 0$ in $H_0$. The only relevant quantities to consider
are then the (dimensionless) separation $r > 0$ and the total
size of the sample $N$.
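The two hypotheses can be made concrete with a short simulation. The following is a minimal sketch, not the authors' code: the function name `draw_sample`, its arguments, and the choice to split the sample into two exactly equal halves are assumptions made for illustration.

```python
import numpy as np


def draw_sample(n, r, sigma=1.0, rng=None):
    """Draw n simulated measurements of X (hypothetical helper).

    r = 0 corresponds to H_0 (a single population with mean 0);
    r > 0 corresponds to H_A (two equally large populations with
    means -r*sigma/2 and +r*sigma/2). sigma combines intrinsic
    scatter and measurement error, both assumed Gaussian.
    """
    rng = np.random.default_rng() if rng is None else rng
    half = n // 2
    means = np.concatenate([
        np.full(half, -0.5 * r * sigma),      # first population
        np.full(n - half, +0.5 * r * sigma),  # second population
    ])
    return means + rng.normal(0.0, sigma, size=n)


rng = np.random.default_rng(2013)
x_null = draw_sample(1000, 0.0, rng=rng)  # one population
x_alt = draw_sample(1000, 2.4, rng=rng)   # two populations, r = 2.4
print(x_null.std(), x_alt.std())
```

Under these assumptions the pooled standard deviation of the two-population sample is $\sigma\sqrt{1 + r^2/4}$, which is why a broadened distribution alone does not reveal the substructure: a single population with a larger $\sigma$ would produce the same width.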

The possibility to distinguish the two populations in $H_A$ depends
both on $r$ and $N$. Clearly, if $r$ is large (say > 5) the two
populations will show up as distinct even for small samples (say
$N = 100$ stars). For smaller $r$ it may still be possible to distinguish
the populations if $N$ is large enough. Exactly how large $N$
must be for a given $r$ is what we want to find out. Conversely,
for a given $N$ this will also show the minimum $r$ that can be
distinguished. Given the true separation in logarithmic abundance
(dex), this in turn sets an upper limit on the standard error of the
abundance measurements.

The two simulated samples in Fig. 2 illustrate the situation
for $N = 1000$. In the top diagram (generated with $r = 2.0$) it is
not possible to conclude that there are two populations, while in
the bottom one (for $r = 2.4$) they are rather clearly distinguished.

Given the data $\mathbf{x} = (x_1, x_2, \dots, x_N)$ we now compute a test
statistic $t(\mathbf{x})$ quantifying how much the data deviate from the
distribution assumed under the null hypothesis, i.e., in this case
a Gaussian with mean value $\mu$ and standard deviation $\sigma$ (both
of which must be estimated from the data). A large value of $t$
indicates that the data do not follow this distribution. The null
hypothesis is consequently rejected if $t(\mathbf{x})$ exceeds some critical
value $C$, chosen such that the probability of falsely rejecting $H_0$
is some suitably small number, say $\alpha = 0.01$ (the significance of
the test).

It should be noted that $H_0$ and $H_A$ are not complementary,
i.e., if $H_0$ is rejected it does not automatically follow that $H_A$
should be accepted. Indeed, there are obviously many possible
distributions of $X$ that cannot be described by either $H_0$ or $H_A$.
Having rejected $H_0$, the next logical step is to test whether $H_A$
provides a reasonable explanation of the data, or if that hypothesis,
too, has to be rejected. However, since we are specifically
interested in detecting substructures in the distribution of $X$, of
which $H_A$ provides the simplest possible example, it is very relevant
to examine how powerful the chosen test is in rejecting $H_0$,
when $H_A$ is true, as a function of $N$ and $r$.

Fig. 2. The top histogram shows a simulated sample of size
$N = 1000$ drawn from a superposition of two Gaussian distributions
separated by $r = 2.0$ standard deviations. The solid curve
is the best-fitting single Gaussian. In this case the null hypothesis,
that the sample was drawn from a single Gaussian, cannot
be rejected. The bottom histogram shows a simulation with
separation $r = 2.4$ standard deviations. The solid curve is again
the best-fitting single Gaussian. In this case the null hypothesis
would be rejected and a much better fit could be obtained by
fitting two Gaussians (not shown).

The test statistic $t(\mathbf{x})$ measures the "distance" of the data
from the best-fitting normal (Gaussian) distribution with free
parameters $\mu$ and $\sigma$. Numerous tests for "normality" exist, but
many of them are quite sensitive to outliers (indeed, some are
constructed to detect outliers) and therefore unsuitable for our
application. Instead we make use of the distance measure $D$
from the well-known Kolmogorov-Smirnov (K-S) one-sample
test (Press et al. 2007), which is relatively insensitive to outliers
and readily adapted to non-Gaussian distributions, if needed. We
define

$$t(\mathbf{x}) = \sqrt{N} \times \min_{\mu,\sigma} \max_{x} \left| F_N(x; \mathbf{x}) - F(x; \mu, \sigma) \right| \qquad (1)$$

where $F_N$ is the empirical distribution function for the given data
(i.e., $F_N(x; \mathbf{x}) = n(x)/N$, where $n(x)$ is the number of data points
$\le x$) and $F(x; \mu, \sigma)$ is the normal cumulative distribution function
for mean value $\mu$ and standard deviation $\sigma$. The expression
in Eq. (1) requires some explanation. The quantity obtained as
the maximum of the absolute difference between the two cumulative
distributions is the distance measure $D$ used in the standard
one-sample K-S test. This $D$ is however a function of the parameters
of the theoretical distribution, in this case $\mu$ and $\sigma$, and we
therefore adjust these parameters to give the minimum $D$. This is
multiplied by $\sqrt{N}$ to make the distribution of $t$ under $H_0$ nearly
independent of $N$, and to avoid inconveniently small values of $D$
for large samples.

³ The distribution of $t$ under $H_0$ does not follow the theoretical
distribution of $D\sqrt{N}$ usually given for the K-S test, i.e., $P(t > C) =
Q_\mathrm{KS}[(1 + 0.12/\sqrt{N} + 0.11/N)\,C]$, where the function $Q_\mathrm{KS}$ is given in
Press et al. (2007). This distribution, shown as a dotted curve in the
bottom diagram of Fig. 3, is clearly very different from the empirical
distribution for $r = 0$ given by the solid curve in the same diagram. The
reason is that the K-S test assumes that the comparison is made with
a fixed distribution $F(x)$. In our case we adjust $\mu$ and $\sigma$ to minimize
$D\sqrt{N}$, which results in a different distribution.

The distribution of $t(\mathbf{x})$ for given $N$ and $r$ must be determined
through Monte Carlo simulations, in which many independent
realizations of $\mathbf{x}$ are generated and $t$ computed for each of them
by application of Eq. (1). We give results for some selected
combinations of ($N$, $r$) in Fig. 3. Each curve in these diagrams
shows the fraction of $t$-values exceeding $C$ in a simulation with
2000 realizations of $\mathbf{x}$. The fractions are plotted on a non-linear
vertical scale (using a $\log[P/(1 - P)]$ transformation) in order to
highlight both tails of the distribution. The wiggles in the upper
and lower parts of the curves are caused by the number statistics
due to the limited number of realizations.

For a given value of $C$, the significance of the test, i.e., the
probability of falsely rejecting $H_0$ ("Type I error"), can be directly
read off the solid curves in Fig. 3 as $\alpha = P(t > C \mid r = 0)$.
Conversely, we can determine the $C$-value to be used for a given
significance level. Adopting a relatively conservative $\alpha = 0.01$
we find that $C = 0.7$ can be used for any sample size. For $r > 0$
the dashed curves give the power $1 - \beta$ of the test, where $\beta$ is the
probability of a "Type II error", i.e., of failing to reject $H_0$ when
$H_A$ is true. For example, if we require $1 - \beta > 0.99$ at $C = 0.7$, the
minimum $r$ that is detected with this high degree of probability
is about 4.2, 2.5, and 1.7 for the sample sizes shown in Fig. 3.
For the two specific examples in Fig. 2 the computed statistic
is $t = 0.41$ (top) and 1.04 (bottom), meaning that $H_0$ would be
rejected at the 1 % significance level in the latter case, but not in
the former.

Results are summarized in Fig. 4, which shows the minimum
sample size as a function of $r$ for the assumptions described
above. The circles are the results of the Monte Carlo simulations
for $\alpha = 0.01$ and $1 - \beta = 0.99$, obtained by interpolating in
Fig. 3 and the corresponding diagrams for $N$ = 30, 300, 3000,
and 30 000. The curve is the fitted function

$$N_{\min} \simeq \exp\left(0.6 + 13\, r^{-0.8}\right) \qquad (2)$$

This function, which has no theoretical foundation and therefore
should not be used outside of the experimental range ($30 < N <
30\,000$), can be inverted to give the minimum separation for a
given sample size:

$$r_{\min} \simeq \left(\frac{\ln N - 0.6}{13}\right)^{-1.25} \qquad (3)$$

Fig. 3. Examples of probability plots for the test statistic $t(\mathbf{x})$ obtained
in Monte Carlo simulations for sample sizes $N = 10^2$,
$10^3$, and $10^4$ (top to bottom). In each diagram the solid curve
shows, as a function of the critical value $C$, the probability that $t$
exceeds $C$ under the null hypothesis ($r = 0$). The dashed curves
show the probabilities under the alternative hypothesis ($r > 0$)
for the $r$-values indicated in the legend. In the bottom diagram
the dotted curve gives, for comparison, the expected distribution
of $D\sqrt{N}$ for a one-sample K-S test in which $F$ is the true distribution
(without adjusting $\mu$ and $\sigma$); see footnote 3.
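The simulations behind these probability plots can be sketched in a few lines. This is an illustrative reimplementation, not the authors' code: the function names, the use of SciPy's Nelder-Mead minimizer to adjust $\mu$ and $\sigma$, and the small default number of realizations are all assumptions (the text uses 2000 realizations per curve).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def ks_distance(params, xs):
    """Max |F_N - F| for sorted data xs and a normal CDF with given parameters."""
    mu, log_sigma = params  # sigma is fitted in log space to keep it positive
    n = xs.size
    cdf = norm.cdf(xs, loc=mu, scale=np.exp(log_sigma))
    above = np.arange(1, n + 1) / n - cdf  # F_N at each data point
    below = cdf - np.arange(0, n) / n      # F_N just below each data point
    return max(above.max(), below.max())


def t_statistic(x):
    """sqrt(N) times the K-S distance minimized over mu and sigma, cf. Eq. (1)."""
    xs = np.sort(np.asarray(x, dtype=float))
    start = [xs.mean(), np.log(xs.std())]
    # Nelder-Mead copes with the non-smooth objective but may stop at a
    # local minimum, so the result is an upper bound on the true minimum.
    result = minimize(ks_distance, start, args=(xs,), method="Nelder-Mead")
    return np.sqrt(xs.size) * result.fun


def rejection_fraction(n, r, crit=0.7, n_real=200, seed=0):
    """Fraction of simulated samples with t > crit, for separation r (units of sigma)."""
    rng = np.random.default_rng(seed)
    half = n // 2
    means = np.concatenate([np.full(half, -r / 2.0), np.full(n - half, r / 2.0)])
    hits = 0
    for _ in range(n_real):
        x = means + rng.normal(0.0, 1.0, size=n)
        if t_statistic(x) > crit:
            hits += 1
    return hits / n_real


if __name__ == "__main__":
    print("significance estimate (r = 0):", rejection_fraction(1000, 0.0))
    print("power estimate (r = 2.5):", rejection_fraction(1000, 2.5))
```

With `r = 0` the returned fraction estimates the significance $\alpha$ for the chosen critical value, and with `r > 0` it estimates the power $1 - \beta$. Resolving probabilities near 0.01 or 0.99 needs far more realizations than the default used here.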



Fig. 4. Minimum sample size needed to distinguish two equal
Gaussian populations, as a function of the separation $r$ of the
population means in units of the standard deviation $\sigma$ of each
population. The circles are the results from Monte Carlo simulations
as described in the text, using a K-S type test with significance
level $\alpha = 0.01$ and power $1 - \beta = 0.99$. The curve is the fitted
function in Eq. (2) or (3).

For example, if the populations are separated by 5 times the
measurement error ($r = 5$), the populations could be separated
already for $N \simeq 70$. For $r = 3$ the minimum sample size is
$N \simeq 400$, and for $r = 2$ it is $N \simeq 3000$. Clearly, if the separation
is about the same as the measurement errors ($r = 1$), the situation
is virtually hopeless even if the sample includes hundreds of
thousands of stars.
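The fitted relation and its inverse are easy to evaluate numerically. The sketch below simply codes Eqs. (2) and (3) as reconstructed from the numbers quoted in the text; the function names `n_min` and `r_min` are chosen here for illustration.

```python
import math


def n_min(r):
    """Eq. (2): minimum sample size for a separation r (in units of sigma)."""
    return math.exp(0.6 + 13.0 * r ** -0.8)


def r_min(n):
    """Eq. (3): minimum detectable separation for a sample of n stars."""
    return ((math.log(n) - 0.6) / 13.0) ** -1.25


# The fit is only meant for 30 < N < 30000, i.e. roughly 1.4 < r < 6.8.
for r in (5.0, 3.0, 2.0):
    print(f"r = {r:.0f}: N_min = {n_min(r):.0f}")
for n in (100, 1000, 10000):
    print(f"N = {n}: r_min = {r_min(n):.2f}")
```

The printed sample sizes are close to the rounded values of about 70, 400, and 3000 quoted above. Evaluating `n_min(1.0)` gives a number of order $10^6$, far outside the fitted range, which is the quantitative content of the statement that $r = 1$ is virtually hopeless.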

It should be remembered that these results were obtained 
with a very specific set of assumptions, including: (1) measure- 
ment errors (and/or internal scatter) that are purely Gaussian; (2) 
that the two populations in the alternative hypothesis are equally 
large; (3) the use of the particular statistic in Eq. (1); and (4) the
choice of significance (a probability of falsely rejecting $H_0$ less
than $\alpha = 0.01$) and power (a probability of correctly rejecting
$H_0$ greater than $1 - \beta = 0.99$). Changing any of these assumptions
would result in a different relation⁴ from the one shown in
Fig. 4. Nevertheless, this investigation already indicates how far
we can go in replacing spectroscopic resolution and signal-to- 
noise ratios (i.e., small measurement errors) with large-number 
statistics. In particular when we consider that real data are never 
as clean, nor the expected abundance patterns as simple as as- 
sumed here, our estimates must be regarded as lower bounds to 
what can realistically be achieved. 



4. Accuracy and precision in stellar abundances 

We have no knowledge a priori of the properties of a star and no 
experiment to manipulate in the laboratory but can only observe 
the emitted radiation and from that infer the stellar properties. 
Therefore the accuracy⁵ of elemental abundances in stars is often
hard to ascertain as it depends on a number of physical effects
and properties that are not always well-known, well-determined,
or well-studied (Baschek 1991). Important examples of relevant
effects include deviations from local thermodynamic equilibrium
(NLTE) and deviations from 1D geometry (Asplund 2005;
Heiter & Eriksson 2006). Additionally, systematic and random
errors in the stellar parameters will further decrease the accuracy
as well as the precision within a study.

⁴ Experiments with unequally large populations in $H_A$ suggest that
the power of the test is not overly sensitive to this assumption, as long
as there is a fair number of stars from each population in the sample.

⁵ 'Accuracy' refers to the capability of a method to return the correct
result of a measurement, in contrast to precision which only implies
agreement between the results of different measurements. It is possible
to have high precision but poor accuracy, as is often the case in astronomy.
For the purpose of the study of trends in elemental abundances in
the Milky Way both are important, but for practical reasons most studies
are concerned with precision rather than accuracy.

Fig. 5. Updated plot for HD 140283 from Gustafsson (2004). On
the y axis are shown the [Fe/H] values determined from spectroscopy
as reported in the literature. The x axis gives the year
of the publication. The error bars indicate (full lines) the line-to-line
scatter (when available) and (dotted lines) a few examples
of attempts to address the full error including errors in the stellar
parameters. The circles (o, at 1996) refer to a single analysis
using three different temperature scales and the cross (x) refers
to an analysis of a high-resolution spectrum using the SEGUE
pipeline (Lee et al. 2011; see also Sect. 4 for database usage).

Fig. 6. This figure shows results from Lee et al. (2008). We plot
the difference between the final abundance adopted in Lee et al.
(2008) and the abundance derived using one of eleven methods
(as described in Lee et al. 2008) as a function of the signal-to-noise
ratio. The error bar shows the scatter for each method.

An interesting example of the slow convergence of the derived
iron abundance in spite of increasing precision is given
in Gustafsson (2004), where he compares literature results for
the well-studied metal-poor sub-giant HD 140283. Over time the
error bars resulting from line-to-line scatter decrease thanks to
increased wavelength coverage (i.e., more Fe I lines are used
in the analysis) and higher signal-to-noise ratios. However,
the differences between studies remain large. An updated and
augmented version of the plot in Gustafsson (2004) is given
in Fig. 5. Data were sourced using SIMBAD and the SAGA
database (Suda et al. 2008). For data listed in the SAGA database
we excluded all non-unique data, e.g., where a value for [Fe/H]
is quoted but that value is not determined in the study in question
but taken from a previous study. The error bars shown are
measures of the precision based on the quality of the spectra and
reflect the line-to-line scatter. Generally, the precision has clearly
improved with time, but, judging from the scatter between different
determinations, it is doubtful if the overall accuracy has
improved much. From around 1995 most studies quote a precision
from measurement errors and errors in log gf values alone
of 0.1 dex. From about the same time there appears also to be
a convergence on two different [Fe/H] values. The difference is
mainly related to a high and a low value of log g, whilst $T_\mathrm{eff}$
appears uncorrelated with this split in [Fe/H] values. This illustrates
the need for homogeneous samples treated in the same
consistent way if substructures are to be detected. Combining
data from many different studies may in fact create unphysical
structures in abundance space.
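How a systematic offset between analyses can mimic a real substructure is shown by the toy simulation below. It is purely illustrative: the offsets of ±0.10 dex, the precision of 0.04 dex, and the sample sizes are assumed values chosen for this sketch, not numbers taken from any of the studies discussed here.

```python
import numpy as np

rng = np.random.default_rng(42)

n_per_study = 500
true_abundance = 0.0      # a single population with no intrinsic scatter (assumed)
precision = 0.04          # random error per measurement, in dex (assumed)
offsets = (-0.10, +0.10)  # zero-point offsets of two analyses, in dex (assumed)

# Every "study" observes the same population but with its own systematic offset.
studies = [
    true_abundance + offset + rng.normal(0.0, precision, n_per_study)
    for offset in offsets
]
combined = np.concatenate(studies)

within_scatter = np.mean([s.std() for s in studies])
apparent_r = (studies[1].mean() - studies[0].mean()) / within_scatter

print(f"scatter within each study: {within_scatter:.3f} dex")
print(f"scatter of combined data : {combined.std():.3f} dex")
print(f"apparent separation r    : {apparent_r:.1f}")
```

In this toy case the combined sample contains two groups separated by about five times the measurement precision, although only one physical population is present. According to the relation derived in Sect. 3, a separation of this size is detectable with fewer than a hundred stars, so such an artefact would readily pass the test.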

An example of a homogeneous treatment of a large number
of stars is the SEGUE survey⁶. An interesting illustration of the
(inherent?) difficulties in reaching accurate results is given by
the first paper on the SSPP pipeline used to analyse the SEGUE
spectra (Lee et al. 2008). The pipeline implements eleven methods
to derive iron abundances. Figure 6 summarizes the resulting
differences between the adopted iron abundance and those derived
using the eleven different methods as a function of signal-to-noise
ratios in the stellar spectra. Most of the methods converge
towards the adopted value around a signal-to-noise ratio
of about 25, where the typical scatter for any of the methods is
about 0.1 dex. However, there are methods that give iron abundances
that deviate systematically by a similar amount also at
higher signal-to-noise ratios, even though in this case the underlying
assumptions are quite uniform. Thus this comparison suggests
that the precision (as judged from the scatter of individual
methods) is about 0.1 dex, and that systematic errors could be at
least as large.

It is possible to assess the precision in the derived elemental
abundances with a full forward modelling of the analysis of
a stellar spectrum. This can be done as a preparatory step for
instrument designs. A recent example is given by Caffau et al.
(2013), who ran model spectra through a simulator built to resemble
the 4MOST multi-object spectrograph for VISTA (de
Jong et al. 2012). The simulator includes a transmission model
of the Earth's atmosphere, a model for the seeing and sky background,
and a simple model of the instrument. They found that
they could reproduce the input abundance ratios with a precision
of 0.1 dex for most elements and 0.2 dex for some elements.

⁶ SEGUE is the Sloan Extension for Galactic Understanding and
Exploration to map the structure and stellar makeup of the Milky Way
Galaxy using the 2.5 m telescope at Apache Point (Yanny et al. 2009).

We note with some interest that Eq. (3) fits results from recent
works in the literature. For example, Nissen & Schuster
(2010) used about 100 stars in their study, and $r_{\min}$ according
to Eq. (3) is thus 4.4. Since their quoted precision is 0.04 dex for
[Mg/Fe], the difference of about 0.2 dex seen in Fig. 1 is compatible
with the prediction from Sect. 3 that the minimum discernible
difference should be about $r_{\min} \times \sigma = 0.17$ dex. A similar
comparison can be made for the results from SEGUE, e.g., as
reported in Lee et al. (2011). With 17 000 stars $r_{\min}$ is 1.6. The
quoted precision is no more than 0.1 dex in [α/Fe], which leads to
$r_{\min} \times \sigma = 0.16$ dex. Figure 1 shows that the difference between
the thin and the thick disk in the solar neighbourhood may be
as large as 0.2 dex, hence the SEGUE spectra should be able to
detect the difference between the two disks. Lee et al. (2011) do
indeed find a clearly bimodal distribution in [α/Fe], although it
may be less visible once the data are corrected for selection effects
(Bovy et al. 2012). We note that even if the precision were
somewhat worse, the situation would still be good. These two studies
nicely illustrate the trade-off between high precision and high
numbers of stars. They also illustrate that our formula in Eq. (3) is
a good representation of actual cases and can be used for decision
making when planning a large survey or a small study.
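As an illustration of such planning, the sketch below turns an expected abundance difference and a per-star precision into a required sample size using Eq. (2). The function names and the decision to raise an error outside the fitted range are choices made for this example. Following the comparison above, it treats the quoted precision as the full dispersion $\sigma$ of each population, i.e., it neglects intrinsic scatter.

```python
import math


def n_min(r):
    """Eq. (2): minimum sample size for a separation r (in units of sigma)."""
    return math.exp(0.6 + 13.0 * r ** -0.8)


def stars_needed(difference_dex, precision_dex):
    """Stars needed to detect a given abundance difference (hypothetical helper).

    Assumes that precision_dex is the total dispersion of each population
    and that the assumptions of Sect. 3 hold (two equal Gaussian
    populations, alpha = 0.01, power = 0.99).
    """
    r = difference_dex / precision_dex
    n = n_min(r)
    if not 30.0 <= n <= 30000.0:
        raise ValueError(
            f"r = {r:.2f} gives N = {n:.3g}, outside the fitted range 30 < N < 30000"
        )
    return n


# Example: a 0.2 dex difference observed with two different precisions.
for precision in (0.05, 0.10):
    print(f"precision {precision:.2f} dex: about {stars_needed(0.2, precision):.0f} stars")
```

Under these assumptions, halving the precision from 0.05 to 0.10 dex raises the required sample from roughly a hundred stars to a few thousand. For a precision of 0.2 dex the separation drops to $r = 1$ and the function refuses to extrapolate.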

A differential study is the best way to reach high precision
(e.g., Gustafsson 2004; Baschek 1991; Magain 1984). One important
aspect in the differential analysis is that measurement
errors or erroneous theoretical calculations for log gf-values become
irrelevant. The power of differential analysis has been amply
exemplified over the past decades (e.g., Edvardsson et al.
1993; Bensby et al. 2004; Nissen & Schuster 2010). A very recent
example is given by the studies of solar twins (Meléndez et al. 2012,
who reached precisions of <0.01 dex). Such precision is possible
because they study solar twins - all the stars have very similar
stellar parameters. This means that erroneous treatment of the
stellar photosphere and the radiative transport, as well as erroneous
log gf-values, cancel out to first order. This "trick" can
be repeated for any type of star and has, e.g., been successfully
applied to metal-poor dwarf stars (Magain 1984; Nissen et al.
2002; Nissen & Schuster 2010).

Most large studies must by necessity mix stars with different 
stellar parameters. However, in future large spectroscopic sur- 
veys it will be feasible, both at the survey design stage and in the 
interpretation of the data, to select and focus on stars with sim- 
ilar stellar parameters. Those smaller, but more precise stellar 
samples will yield more information on potential substructures 
in elemental abundance space than would be the case if all stars
were lumped together in order to improve the number statistics.

5. Concluding remarks 

With the advent of Gaia, the exploration of the Milky Way as a 
galaxy will take a quantum leap forward. We will be working in 
a completely new regime - that of Galactic precision astronomy. 
Gaia is concentrating on providing the best possible distances 
and proper motions for a billion objects across the Milky Way 
and beyond. For stars brighter than 17th magnitude radial veloc- 
ities will also be supplied. However, for fainter stars no radial 
velocities will be obtained and thus no complete velocity vec- 
tor will be available. No detailed elemental abundances will be 
available for any star based on the limited on-board facilities. 

The Gaia project has therefore created significant activity 
also as concerns ground-based spectroscopic follow-up. A re- 
cent outcome of that is the approval of the Gaia-ESO Survey 
proposal, which has been given 300 nights on VLT (Gilmore
et al. 2012). In Europe several studies are under way for massive
ground-based follow-up of Gaia including both low- and high-resolution
spectra. The designs include multiplexes of up to 3000
fibres over fields of view of up to 5 deg² (de Jong et al. 2012;
Cirasuolo et al. 2012; Balcells et al. 2010). A number of other
projects are currently under way and will also contribute relevant
data to complement Gaia, even though they were not always
designed with Gaia in mind. Examples include the on-going
APOGEE, which will observe about 100 000 giant stars down
to H = 12.5 at high resolution in the near-infrared in the Bulge
and Milky Way disk (Wilson et al. 2010), and LAMOST, which
will cover large fractions of the Northern sky and especially the
anti-center direction (Cui et al. 2012). Of particular interest to
Gaia and to the European efforts is the GALAH survey, which
will use the high-resolution optical multi-object HERMES spectrograph
at AAT to do a large survey down to V = 14 (Heijmans
et al. 2012). The promise of elemental abundances for hundreds
of thousands to millions of stars across all major components of
the Galaxy, spread over much larger distances than ever before,
is very exciting. Here we have investigated which types of substructures
in abundance space could be distinguished with
these observations.

Clearly the arguments presented in Sect. 3 show that it
is mandatory to strive for the best possible precision in the
abundance measurements in order to detect stellar populations
that differ in their elemental abundances from each other.
Equation (2) gives an estimate of the number of stars needed to
detect substructures in abundance space when the precision is
known and can be used as a tool for trade-offs between number
statistics and precision when planning large surveys.